Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
                                            Some full text articles may not yet be available without a charge during the embargo (administrative interval).
                                        
                                        
                                        
                                            
                                                
                                             What is a DOI Number?
                                        
                                    
                                
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            Generating labeled training datasets has become a major bottleneck in Machine Learning (ML) pipelines. Active ML aims to address this issue by designing learning algorithms that automatically and adaptively select the most informative examples for labeling so that human time is not wasted labeling irrelevant, redundant, or trivial examples. This paper proposes a new approach to active ML with nonparametric or overparameterized models such as kernel methods and neural networks. In the context of binary classification, the new approach is shown to possess a variety of desirable properties that allow active learning algorithms to automatically and efficiently identify decision boundaries and data clusters.more » « less
- 
            Overparameterized machine learning models are often fit perfectly to training data, yet remarkably generalize well to new data. However, learning good models can require an enormous number of labeled training data. This challenge motivates the study of active learning algorithms that sequentially and adaptively request labels for “informative” examples for a large pool of unlabeled data. A maximin criterion was recently proposed for active learning specifically in the overparameterized and interpolating regime. Roughly speaking, the maximin criterion selects the example that is most difficult to interpolate, as measured by an appropriate norm on the interpolating func- tion. Data-dependent norms perform best empirically, exhibiting intriguing adaptivity to cluster structure within the data. The main contribution of this paper is to mathematically characterize this behavior. Our main results show that the maximin criterion based on data-dependent norms provably discovers clusters and also automatically generates labeled coverings of the dataset.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                     Full Text Available
                                                Full Text Available